Abstract: We introduce a family of novel ranking algorithms called
ERank which run in linear or near-linear time and build on explicitly
modeling a network as uncertain evidence. The model uses Probabilistic
Argumentation Systems (PAS), which combine probability theory and
propositional logic and are a special case of the Dempster-Shafer
Theory of Evidence. ERank rapidly generates approximate results for
the NP-complete problem involved, enabling the use of the technique in
large networks. We use a previously introduced PAS model for citation
networks, generalizing it to all networks. We propose a statistical
test, based on a clustering validity test, for comparing the
performance of different ranking algorithms. Our experiments using
this test on a real-world network show ERank to have the best
performance in comparison with well-known algorithms including
PageRank, closeness, and betweenness.



I. Ranking in Complex Networks 

Ranking nodes in complex networks is an important challenge.
Depending on the type of network and the application, the meaning of
a rank can differ. For the World Wide Web one is usually after
popular and informative pages (e.g. Google). For a citation network
it is influential papers; for social networks (e.g. Facebook,
LinkedIn) it is central or important persons. More recently, networks
have become tools for calculating trust and transitional trust [1].

Algorithms applied today to large networks often rely on an 
intuitive idea (e.g. closeness or betweenness centrality [2]) or 
empirical results (e.g. eigenvector based algorithms such as 
PageRank [3]) but there is no clear and formal foundation as 
to why they actually work or how they are sound. 

When examining a network there is the implicit assumption 
that it encodes (some uncertain) evidence about the nature of 
the relations between the nodes. Quantitative reasoning under 
uncertainty is a prolific research field offering many methods 
and frameworks. 

One would therefore expect quantitative reasoning to be applied to
the ranking problem, yet this is rarely done. There are different
reasons for this. For example, Bayesian networks [4] are restricted
to directed acyclic graphs. An alternative is the Dempster-Shafer
Theory of Evidence (DST) [5]-[7], which has enjoyed a recent surge of
interest [8]. The adoption of DST based methods has been hampered by
the NP-complete complexity of the computations involved [9]. When one
contemplates the application of a ranking method to large complex
networks such as those above, anything much higher than linear time
complexity can become virtually impossible to apply.

In this work we bring forward a family of novel algorithms 
which we refer to as ERank. Our algorithms have linear and 
lower polynomial time complexities for quantitative reasoning 
specializing for the node ranking domain. ERank is based on 
Probabilistic Argumentation Systems (PAS) [10], [11] which 
are a way of combining propositional logic and probability 
theory. PAS can be mapped to the DST domain acting as a 
probabilistic way to interpret DST. 

Our effort can be viewed as having two phases: the construction of a
PAS instance to represent a network, and the approximation of
calculations on that PAS instance. For the first phase, we will use a
framework developed by Picard in [12]-[14] and rebrand it as a
general PAS based network analysis tool, formalizing our approach in
[15]. The end product of this phase is a PAS instance: a
representation of a network in a quantitative reasoning system where
one can perform ranking calculations. However, as we will explore
below, it is practically impossible to do the exact PAS calculations
required for ranking when a large network is examined, due to the
NP-complete complexity involved. Essentially, what is needed for such
a task is a linear or near-linear time algorithm.

In the second phase we introduce ERank as a means 
of approximating these complex calculations. ERank is a 
specialized approximation algorithm which works for the 
PAS instance mapped from a network such as above. It is 
an iterative algorithm building on the idea of propagating 
probabilities on the network and rapidly generating estimate 
results in linear/near linear time. 

We view bridging the research in two different fields, ranking
algorithms for very large networks and quantitative reasoning, as an
important part of the contribution of this article. We have strived
to keep our text accessible to researchers from both directions.

The remainder of this article is organized as follows. In Section II
we will briefly review well-known and widely used ranking algorithms
and present an overview of PAS, limiting our focus to the directly
relevant parts. We will also introduce the Reuters news co-occurrence
network [16], which will be our real-world test bed throughout the
article. Section III will show how a network is mapped to a PAS
instance. Section IV will introduce and examine different aspects of
the ERank algorithms. In Section V we will propose a method for
comparing the performances of ranking algorithms on the Reuters
network. We will then make a study of various well-known ranking
algorithms, comparing them to ERank. Finally, before concluding, in
Section VI we will explore how different choices for parameters in
ERank affect performance.

II. Background 

A. "Importance" of nodes in complex networks

"Importance" is a concept that is frequently met when 
dealing with complex networks but it is not always well- 
defined what is meant. Depending on the type of network it 
may mean popularity, reliability/reputation or authority among 
others. In this work we have used a variety of well known 
"centrality measures" which are also mentioned as "ranking 
algorithms". These give a measure of how important a node 
in the network is. 

Arguably the oldest of its kind, "citation count" is traditionally
used in scientific literature both to assess the importance of an
article and the authority of an author. Citation networks were shown
to be small-world networks where citation count is simply the
in-degree of a node in a citation network [17].

Two common measures of centrality are offered in complex 
networks literature; closeness and betweenness [2]. Closeness 
measures the shortest distance from a person to every other 
person. Here central nodes are the ones which are closest to 
all other nodes. Betweenness examines the extent to which a 
node is situated between others in a network. It is a measure 
of how much damage there would be to the connectivity if a 
given node is removed from the network. 

The famous ranking algorithm called PageRank [3] establishes the
importance of a web page for the Google search engine. Together with
HITS [18], it sparked interest in this kind of algorithm in the
information retrieval community. PageRank originally builds on the
intuition that while citation count is a reasonable attempt towards
assessing the importance of a document, it would be even better to
"extend" it to take the citer's importance into account. PageRanks
are simply stationary probabilities for a "random surfer" on a
directed graph who follows one random link at a time, and has a
constant probability of making a random jump to any node.
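The random-surfer description can be made concrete with a short power iteration. The following is only a minimal sketch: the function name `pagerank`, the three-node graph, the link-following probability of 0.85, and the uniform treatment of nodes without outgoing links are illustrative assumptions and are not taken from [3].

```python
def pagerank(out_links, damping=0.85, iterations=100):
    """Power iteration for the random-surfer model on a directed graph.

    out_links maps every node to the list of nodes it links to.
    damping is the (assumed) probability of following a link; with
    probability 1 - damping the surfer jumps to a uniformly random node.
    """
    nodes = sorted(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Every node receives the random-jump share first.
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = out_links[v]
            if targets:
                share = damping * rank[v] / len(targets)
                for w in targets:
                    new_rank[w] += share
            else:
                # Node without outgoing links: spread its rank uniformly
                # (one common convention, assumed here).
                for w in nodes:
                    new_rank[w] += damping * rank[v] / n
        rank = new_rank
    return rank


# Hypothetical three-node graph: 2 links to 1 and 3, 3 links to 1.
toy_graph = {1: [], 2: [1, 3], 3: [1]}
ranks = pagerank(toy_graph)
```

In this toy graph node 1 receives links from both other nodes and ends up with the largest stationary probability, while node 2, which nobody links to, ends up with the smallest.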

PageRank was conjectured to be a useful way of ranking 
pages and its success has been demonstrated in the success 
of Google. However judging the authority of a web page for 
evaluations can be a very difficult and costly task requiring 
questionnaires and manual evaluation. In a work by Borodin 
et al. [19] such an evaluation is done for PageRank and some 
other algorithms and PageRank was found not to perform 
better than citation count. 

Picard, whose PAS model for citation networks we gen- 
eralize and use in this article, suggests the use of PAS for 
popularity ranking instead of PageRank [13]. In this work, 
ranking using PAS is highlighted as a means of generating 
personalized ranks for each user. 

Recently, with the "semantic web" concept, the need to assess
important nodes has surfaced again. In a survey of such works [1] we
see that the ranking algorithms we mention (especially PageRank) or
similar ones are used.



B. Probabilistic Argumentation Systems 

We will be using Probabilistic Argumentation Systems (PAS) 
[10], [11] to model relations between different nodes in a 
network. PAS use a combination of probability theory and 
propositional logic building in turn on Dempster-Shafer The- 
ory of Mathematical Evidence (DST) [5]-[7]. As both PAS 
and DST are broad research topics on their own, we will only 
be concerned with the necessary parts. We believe Picard does 
a fine job of summarizing in [14] from which we will heavily 
borrow below. 

Despite what one might think, propositional logic is capable of
expressing uncertainty. Propositions are normally used to express
statements such as "it is sunny". A proposition can then take a truth
value depending on the system modeled. Let us introduce a new class
of propositions called assumptions. We will be using these to express
uncertainty on propositions. Let $v_1$ be a proposition stating "it
will rain tomorrow", and $a_1$ a corresponding assumption. Consider
the following:

$a_1 \rightarrow v_1$

We read it as "if assumption $a_1$ is true then it will rain
tomorrow", thus effectively "it may rain tomorrow". More complex
relations can be expressed as propositional sentences; see Table I
for examples.



TABLE I

Knowledge representation in PAS.

| Type of knowledge | Logical representation | Natural language representation |
|---|---|---|
| a fact | $v_1$ | "$v_1$ is a fact" |
| a simple rule | $v_1 \rightarrow v_2$ | "$v_1$ implies $v_2$" |
| an uncertain fact | $a_1 \rightarrow v_1$ | "if assumption $a_1$ is true, then $v_1$ is true" |
| a simple uncertain rule | $a_1 \rightarrow (v_1 \rightarrow v_2)$, equivalently $a_1 \wedge v_1 \rightarrow v_2$ | "if assumption $a_1$ is true, then $v_1$ implies $v_2$" |



A Propositional Argumentation System is a triple $(P, A, \xi)$ where
$P = \{v_1, v_2, \ldots, v_n\}$ is the set of propositions,
$A = \{a_1, a_2, \ldots, a_m\}$ is the set of assumptions, and $\xi$
is the knowledge base. $\xi$ can sometimes be specified as a set
$\xi = \{\xi_1, \xi_2, \ldots, \xi_r\}$ representing a conjunction of
propositional clauses. Note that $A \cap P = \emptyset$.

A hypothesis $h$ is any logical formula of interest to us, with
symbols in $A \cup P$. An argument is a conjunction of assumptions
which is said to be in favor of (or against) $h$ if under its
assignment $h$ becomes true (or false). The hypothesis $h$ is then
said to be supported (or discarded) by the argument. The support of
$h$ with regard to $\xi$ is equal to the disjunction of all the
arguments supporting $h$, and is denoted $SP(h, \xi)$.

So far we have considered the qualitative aspect. It is also possible
to introduce a quantitative judgment by using probability assignments
for assumptions. The quadruple $PAS_p = (P, A, \xi, \Pi)$ is called a
Probabilistic Argumentation System (PAS), where $\Pi$ represents the
probability assignments for assumptions (e.g.
$\Pi = [p(a_1) \ldots p(a_m)]^T$ where $p(a_j)$ is the probability of
$a_j$ being true). The probability distributions of all the
assumptions are assumed to be stochastically independent. Thus the
probability of a conjunction of assumptions is simply the product of
the individual probabilities for the assumptions involved (e.g. for
the case $a_1$ = true and $a_2$ = false,
$p(a_1 \wedge \neg a_2) = p(a_1)(1 - p(a_2))$).

The quantitative value representing the support for a hypothesis is
the degree of support, denoted $dsp(h, \xi)$. Simply put, it yields a
value $0 \le dsp(h, \xi) \le 1$ which gives the posterior probability
that the hypothesis is supported by the evidence.

Note that an important feature of this kind of knowledge base is that
the dsp function is non-decreasing with additional evidence. Note
also that when a given knowledge base entails no contradictions the
following equation holds [10]:

$$dsp(h, \xi) = p(SP(h, \xi)) \tag{1}$$

The dsp value corresponds to belief in the hypothesis in DST. PAS
represent a special case of DST, and make it possible to interpret
belief probabilistically [10]. Thus dsp corresponds to the posterior
probability that the hypothesis is true in the system.

Example 1: Consider the following Propositional Argumentation System:
assumptions $A = \{a_1, a_2, a_3\}$, propositions $P = \{v_1, v_2\}$,
and the knowledge base $\xi = \{\xi_1, \xi_2, \xi_3\}$ where

$\xi_1 : a_1 \rightarrow v_1$
$\xi_2 : a_2 \rightarrow v_2$
$\xi_3 : v_2 \rightarrow (a_3 \rightarrow v_1)$

If our hypothesis is $h = v_1$, the support for $h$ is the
disjunction of all the arguments which make $v_1$ true. After
examining the rules above we can see that $SP(h, \xi)$ is:

$$SP(h, \xi) = a_1 \vee (a_2 \wedge a_3) \tag{2}$$

Using an alternative notation, $SP(h, \xi) = \{a_1, a_2 \wedge a_3\}$.

Let the probability assignments for the assumptions be
$p(a_1) = 0.6$, $p(a_2) = 0.3$, and $p(a_3) = 0.2$. We already know
the supporting arguments for the hypothesis $v_1$. However, we cannot
simply add the corresponding probabilities because they have to be
made disjoint first:

$$\begin{aligned}
dsp(v_1, \xi) &= p(SP(v_1, \xi)) \\
&= p(a_1 \vee (a_2 \wedge a_3)) \\
&= p(a_1) + p(\neg a_1 \wedge a_2 \wedge a_3) \\
&= p(a_1) + (1 - p(a_1)) \cdot p(a_2) \cdot p(a_3) \\
&= 0.6 + (1 - 0.6) \cdot 0.3 \cdot 0.2 \\
&= 0.624
\end{aligned}$$
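The result of Example 1 can be checked by brute force: enumerate every truth assignment of the three assumptions, keep those in which the support of $v_1$ holds, and add up their probabilities. The sketch below is an illustration only; the helper name `degree_of_support` and the dictionary-based encoding are our assumptions. The enumeration is exponential in the number of assumptions, so it is useful only for very small examples.

```python
from itertools import product


def degree_of_support(probabilities, is_supported):
    """Sum the probabilities of all assignments in which the support holds.

    probabilities maps assumption names to p(assumption is true);
    is_supported takes a dict {name: bool} and returns True when the
    hypothesis is supported under that assignment.
    """
    names = sorted(probabilities)
    total = 0.0
    for values in product([True, False], repeat=len(names)):
        world = dict(zip(names, values))
        if is_supported(world):
            weight = 1.0
            for name in names:
                p = probabilities[name]
                # Assumptions are stochastically independent.
                weight *= p if world[name] else 1.0 - p
            total += weight
    return total


priors = {"a1": 0.6, "a2": 0.3, "a3": 0.2}
# SP(v1) = a1 or (a2 and a3), as in Example 1.
dsp_v1 = degree_of_support(priors, lambda w: w["a1"] or (w["a2"] and w["a3"]))
```

Because every assignment is counted exactly once, no explicit step of making the arguments disjoint is needed, and `dsp_v1` agrees with the value 0.624 derived above up to floating-point rounding.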

C. Co-occurrence Network of Reuters News 

We will be using the co-occurrence network of Reuters news [16] as a
test network for our algorithms. We will be analyzing the
"importance" of the persons in this network. It is constructed using
the Reuters-21578 corpus, which contains 21578 Reuters newswire
articles that appeared in 1987, mostly on economics. This is a
network with 5249 nodes and 7528 edges, where nodes represent
individual people and there is an edge between two persons if they
appear in an article together. We chose to treat edges as unweighted.
These people are often well-known or powerful people of their time in
politics or business. It was shown in [16] that this network exhibits
small-world properties; that work also presents a study of different
well-known ranking algorithms. We convert this undirected network to
a directed network by using two arcs in opposite directions in place
of each edge. The diameter of the undirected network is 13.

III. Using PAS to Model Network Relations 

PAS for network analysis were initially used to model and analyze
citation networks [12]-[14]. In these works the main problem is
enhancing the performance of information retrieval with regard to
relevance. Picard introduces a PAS based framework to model network
relationships between documents. We will be using this model,
generalizing it into a general network analysis tool. We have
formalized our approach in [15]. Simply put, the model is no longer
restricted to documents and the hyperlinks between them, but can
represent the nodes and links of any network. We introduce the
concept of a transitive relation to establish the context of the
analysis.

For example, if we want to model the spread of a contagious 
disease, then the links could represent the infection probabil- 
ities between individuals and the node assumptions would be 
the initial probabilities that a given individual in the population 
is already infected. In this setting, the degree of support for 
a given node proposition would give the posterior probability 
that a given person is sick given the relations structure between 
individuals. When analyzing the importance of persons in a 
social network then our transitive relation could be "(if person 
A is linked to person B then) person B is influenced by 
person A", for WWW it can be "(if page A links to page B 
then) page B is found important/informative by page A". The 
mathematical model is not affected as long as the relation is 
transitive. It is debatable what constitutes a transitive relation 
especially in a social setting. For example, if a person (A) is 
influenced by another (B) who in turn is influenced by a third 
person (C) it is nevertheless possible (A) and (C) do not know 
each other. We can still consider this a transitive relation for 
this model, if (C) can indirectly influence (A) by influencing 
(B). It is possible to see how this would happen if there is 
absolute trust involved. The PAS model is capable though of 
handling a lower level or uncertain level relation. 

A network is mapped into a PAS instance $PAS_p = (P, A, \xi, \Pi)$.
Each node $i$ has a corresponding proposition $v_i \in P$ and an
assumption $a_i \in A$. The link from node $i$ to node $j$ has the
link assumption $l_{ij} \in A$. The assumptions represent the chosen
transitive relation. The knowledge base $\xi$ then consists of the
conjunction of clauses of the following forms:

$a_i \rightarrow v_i$ : for each node $i$

$(v_i \wedge l_{ij}) \rightarrow v_j$ : whenever there is a link from node $i$ to node $j$.

The knowledge base in this model is made of Horn clauses (i.e.
sentences of the type $a \wedge b \wedge c \wedge \ldots \rightarrow z$).
Finding the support $SP(v_i)$ can be identified as an inference
(argument finding) problem and is known to have linear complexity
[20]. The knowledge base also entails no contradictions, so Eq. 1
holds.

Example 2: Consider the simple network in Fig. 1(a). The knowledge
base $\xi$ for this network is given below:

$\xi_1 : a_1 \rightarrow v_1$
$\xi_2 : a_2 \rightarrow v_2$
$\xi_3 : a_3 \rightarrow v_3$
$\xi_4 : (v_2 \wedge l_{21}) \rightarrow v_1$
$\xi_5 : (v_2 \wedge l_{23}) \rightarrow v_3$
$\xi_6 : (v_3 \wedge l_{31}) \rightarrow v_1$

Fig. 1. (a) A simple network, (b) Corresponding PAS graph.

Using logical inference on $\xi$ we can find the set of supporting
arguments for $v_1$. Note the reach of support of $v_2$ to $v_1$ via
$v_3$.

$$SP(v_1) = a_1 \vee (a_2 \wedge l_{21}) \vee (a_2 \wedge l_{23} \wedge l_{31}) \vee (a_3 \wedge l_{31}) \tag{3}$$

Now consider the same network in Fig. 1(b), this time also showing
the propositional symbols. The circle nodes represent node
propositions $v_i$, and the square nodes represent node assumptions
$a_i$ and link assumptions $l_{ij}$. Note how the inference process
for a given node is reminiscent of walking backwards on the graph
from the node.
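The backward walk can be sketched in a few lines. The code below is an illustration under our own assumptions: the function name `supporting_arguments`, the `parents` dictionary, and the string names such as `"l21"` (which are only unambiguous for single-digit node labels) are hypothetical. It follows every cycle-free path backwards from the target node, so the number of arguments it returns can grow exponentially with the size of the network, and the arguments are not minimized.

```python
def supporting_arguments(node, parents, path=()):
    """Collect supporting arguments of v_node by walking links backwards.

    parents maps each node to the list of nodes with a link into it.
    path holds the nodes already visited on the way back, so that
    cycles are not followed. Each argument is a frozenset of assumption
    names: "a<i>" for node assumptions, "l<i><j>" for the link i -> j.
    """
    # The node's own assumption always supports its proposition.
    arguments = {frozenset({f"a{node}"})}
    for parent in parents.get(node, []):
        if parent == node or parent in path:
            continue
        link = f"l{parent}{node}"
        for argument in supporting_arguments(parent, parents, path + (node,)):
            arguments.add(argument | {link})
    return arguments


# Network of Fig. 1(a): links 2 -> 1, 2 -> 3 and 3 -> 1.
parents = {1: [2, 3], 2: [], 3: [2]}
sp_v1 = supporting_arguments(1, parents)
```

For node 1 this returns the four arguments of Example 2: $a_1$, $a_2 \wedge l_{21}$, $a_2 \wedge l_{23} \wedge l_{31}$ and $a_3 \wedge l_{31}$.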

As proven in [15], the general formulation of support for a given
node's proposition $v_i$ is:

$$SP(v_i) = a_i \vee \bigvee_{j \in P_i} (SP(v_j) \wedge l_{ji}) \tag{4}$$

where $P_i$ is the set containing the parent nodes of $i$. The
inclusion-exclusion rule is useful for evaluating this kind of
expression:

$$p(a \vee b) = p(a) + p(b) - p(a \wedge b)$$

where $a$ and $b$ are propositional sentences. If $a$ and $b$ are
disjunct, i.e. they share no assumptions and are therefore
stochastically independent, it becomes:

$$p(a \vee b) = p(a) + p(b) - p(a)p(b) = 1 - (1 - p(a))(1 - p(b))$$

Example 3: Now let us look at the quantitative aspect of the previous
example. We will use the short form $dsp_i$ for $dsp(v_i)$. Before we
can calculate $dsp_1$, the expression in Eq. 3 needs to be made
disjoint. Below is one way to do it (dropping the $\wedge$ symbols
for convenience):

$$\begin{aligned}
SP(v_1) &= a_1 \vee a_2 (l_{21} \vee l_{23} l_{31}) \vee a_3 l_{31} \\
&= a_1 \\
&\quad \vee \neg a_1 a_2 (l_{21} \vee l_{23} l_{31}) \\
&\quad \vee \neg a_1 \neg a_2 a_3 l_{31}
\end{aligned}$$

This sentence is disjoint except for the expression in the middle,
which includes the disjunction of two (disjunct) clauses. Using the
inclusion-exclusion rule:

$$\begin{aligned}
dsp_1 = {} & p(a_1) \\
& + (1 - p(a_1))\, p(a_2)\, \big(p(l_{21}) + p(l_{23})\, p(l_{31}) - p(l_{21})\, p(l_{23})\, p(l_{31})\big) \\
& + (1 - p(a_1))(1 - p(a_2))\, p(a_3)\, p(l_{31})
\end{aligned}$$

Let us use the values $p(a_1) = p(a_2) = p(a_3) = 0.3$, and
$p(l_{21}) = p(l_{23}) = p(l_{31}) = 0.5$. Inserting these above
gives $dsp_1 = 0.5047$. Using the infection interpretation, when
there is a 0.3 probability of "infection" on each node, node 1 has a
higher posterior probability 0.5047 to eventually catch the disease,
which is what we expect to see.
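As a cross-check on Example 3, the degree of support of node 1 can also be obtained by enumerating all $2^6$ truth assignments of the three node assumptions and three link assumptions, deriving the true propositions by forward chaining over the Horn clauses for each assignment. The sketch below is illustrative; all names in it are our own assumptions. It also evaluates the three-term expression of the example for comparison. In our own run of this sketch the enumeration gives 0.512625, slightly above the 0.50475 of the three-term expression. The difference, 0.007875, is the probability of the assignment $\neg a_1 \wedge a_2 \wedge a_3 \wedge l_{31} \wedge \neg l_{21} \wedge \neg l_{23}$, which supports $v_1$ through $a_3 \wedge l_{31}$ but does not appear to be covered by the three disjoint terms.

```python
from itertools import product

LINKS = [(2, 1), (2, 3), (3, 1)]          # links of Fig. 1(a) as (source, target)
P_NODE = {1: 0.3, 2: 0.3, 3: 0.3}         # p(a_i)
P_LINK = {link: 0.5 for link in LINKS}    # p(l_ij)


def node_is_supported(target, true_nodes, true_links):
    """Forward chaining over the Horn clauses for one truth assignment."""
    derived = set(true_nodes)              # a_i -> v_i
    changed = True
    while changed:
        changed = False
        for source, dest in true_links:    # (v_i and l_ij) -> v_j
            if source in derived and dest not in derived:
                derived.add(dest)
                changed = True
    return target in derived


def exact_dsp(target):
    """Sum the probability of every assignment that supports v_target."""
    nodes = sorted(P_NODE)
    total = 0.0
    for node_bits in product([True, False], repeat=len(nodes)):
        for link_bits in product([True, False], repeat=len(LINKS)):
            weight = 1.0
            for node, bit in zip(nodes, node_bits):
                weight *= P_NODE[node] if bit else 1.0 - P_NODE[node]
            for link, bit in zip(LINKS, link_bits):
                weight *= P_LINK[link] if bit else 1.0 - P_LINK[link]
            true_nodes = [node for node, bit in zip(nodes, node_bits) if bit]
            true_links = [link for link, bit in zip(LINKS, link_bits) if bit]
            if node_is_supported(target, true_nodes, true_links):
                total += weight
    return total


def three_term_expression():
    """The three-term expression of Example 3 with p(a)=0.3 and p(l)=0.5."""
    pa, pl = 0.3, 0.5
    middle = pl + pl * pl - pl * pl * pl
    return pa + (1 - pa) * pa * middle + (1 - pa) * (1 - pa) * pa * pl


enumerated = exact_dsp(1)
closed_form = three_term_expression()
```

The enumeration is exponential in the number of assumptions and is therefore only a checking device for toy networks, in line with the complexity discussion that follows.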

Making an expression disjoint is in fact an NP-complete 
problem as it involves the satisfiability problem (SAT) which 
is a well-known NP-complete problem [21]. So, although 
finding SP(vi) of node i is relatively easy with O(N) 
complexity, finding dspi can be prohibitively expensive. The 
basic way to calculate the probability of an expression is to 
apply the inclusion-exclusion rule repetitively which creates 
an exponential number of sub-expressions. There are however 
more efficient algorithms, such as the Heidtman [22] algorithm 
or algorithms which make use of binary decision diagrams 
(BDD) [21], [23], but the problem remains NP-complete. 

IV. Approximating PAS on Complex Networks 

We have seen that the exact degree of support calculation for PAS is
NP-complete, so no polynomial-time algorithm for it is known.
Considering that the number of nodes affecting a node's rank can be
as large as the number of all the nodes in a complex network, for
many networks it is practically impossible to calculate the exact
$dsp_i$ values.

One possible way to control the complexity is to limit how far one
goes back in the network when collecting support. We will use the
term maximum order of a supporting argument to refer to the number of
link assumptions in the argument, as introduced in [14]. For example,
in Example 2 $SP(v_1)$ contains one supporting argument with 2 link
assumptions ($a_2 \wedge l_{23} \wedge l_{31}$) and two others with
only 1 link assumption ($a_2 \wedge l_{21}$ and $a_3 \wedge l_{31}$).
Therefore the maximum order is 2.

Even calculations with a maximum order of 2 can be very difficult.
Consider a citation network: for a paper we would have to consider
the immediate citations, and then the citations to the citers. A
paper can get more than 1000 citations, and the citing papers may
have citations to them. This would correspond to including the
contributions of thousands of different papers in a dsp calculation.
We have used a BDD based implementation [15] for exact dsp
calculations and found that this calculation is impossible within
realistic time/space limits. In [14] this is also reported as a
problem, where the author suggests the use of a maximum order of 1
(using only immediate citers) where a higher order is not possible.

Although highly optimized algorithms might in the future make such a
calculation feasible, it is certainly not an easy task. Moreover, a
calculation with a maximum order of 2 would fail to capture a more
global picture of the network. Recall that capturing such a global
picture was one of the motivations behind the introduction of
PageRank [3].

For having a realistic chance to be applicable to ranking in 
very large complex networks an algorithm needs to have linear 
or close to linear time complexity and ideally utilize only 
local information to a node. In this section we will formulate 
such an algorithm. The ranking process will be viewed as 
a propagation of node probabilities over links in an iterative 
algorithm. There are two main challenges to consider, namely 
overestimation and cycles. 



Overestimation 

We can make an exact calculation using only local information for a
node if the supports of the citer nodes are disjoint. If we assume
them to be disjoint when they are not, then we would overestimate the
degree of support. Let us detail this with an example. Consider
Fig. 1(a): the neighbors of node 1 are nodes 2 and 3. We know from
Eq. 4 that the support for $v_1$ is:

$$SP(v_1) = a_1 \vee (SP(v_2) \wedge l_{21}) \vee (SP(v_3) \wedge l_{31}) \tag{5}$$

If we assume $SP(v_2)$ and $SP(v_3)$ to be disjoint then we get
$dsp'_1$ as below:

$$dsp'_1 = 1 - (1 - p(a_1))(1 - dsp_2\, p(l_{21}))(1 - dsp_3\, p(l_{31}))$$

where we apply the inclusion-exclusion rule to Eq. 5. Using the
values from our example we see that $dsp'_1 = 0.5255$ compared to
$dsp_1 = 0.5047$. Note that the values are rather similar, and the
difference is due to overestimating the effect of node 2.
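The arithmetic behind the overestimated value can be retraced in a few lines. This is a sketch using the example values $p(a_i) = 0.3$ and $p(l_{ij}) = 0.5$; the variable names are ours. Nodes 2 and 3 are evaluated first, since node 2 has no parents and node 3 has node 2 as its only parent, so their own values involve no overlap.

```python
p_a = 0.3   # prior of every node assumption in the example
p_l = 0.5   # prior of every link assumption in the example

# Node 2 has no parents, so its degree of support is its own prior.
dsp_2 = p_a

# Node 3 is supported by its own assumption or by node 2 over link 2 -> 3.
dsp_3 = 1 - (1 - p_a) * (1 - dsp_2 * p_l)

# Node 1, treating the supports coming from nodes 2 and 3 as disjoint.
dsp_1_disjoint = 1 - (1 - p_a) * (1 - dsp_2 * p_l) * (1 - dsp_3 * p_l)
```

Here `dsp_3` evaluates to 0.405 and `dsp_1_disjoint` to 0.5254875, i.e. 0.5255 after rounding. The support of node 3 already contains a contribution from node 2, which is why the effect of node 2 on node 1 is counted too generously.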

This leads us to formulating the common conjunction model, which uses
a damping function $d_c(v_i)$ to discount the possible effects of
overestimation:

$$dsp'_i = 1 - (1 - p(a_i)) \Big( 1 - d_c(v_i) \Big[ 1 - \prod_{j \in P_i} \big(1 - dsp_j\, p(l_{ji})\big) \Big] \Big)$$

This is equivalent to doing a partial transformation on the immediate
neighbors of a node, and accounting for the previous "entanglement"
using an extra "damping" node; see Fig. 2 for a demonstration of the
idea.

Recall that for small-world networks [17] it is shown that if vertex
$i$ is connected to vertex $j$ and vertex $k$, then it is highly
probable that vertices $j$ and $k$ are also connected. The damping
function is therefore used to counter the effect of this clustering.




Fig. 2. Transformed graph as seen by node 1. 



We now formulate our first approximation method, which we name
ERank-0, as below:

$$\widetilde{dsp}_i^{\,k+1} = 1 - (1 - p(a_i)) \Big( 1 - d_c(v_i) \Big[ 1 - \prod_{j \in P_i} \big(1 - \widetilde{dsp}_j^{\,k}\, p(l_{ji})\big) \Big] \Big)$$

where $\widetilde{dsp}_i^{\,k}$ is the dsp estimate for node $i$ at
iteration $k$ with the initial condition
$\widetilde{dsp}_i^{\,0} = 0$. $ERank0(i) = \widetilde{dsp}_i^{\,k}$
for a chosen number of iterations $k$. We can think of this as a
series of approximations based on how far we go back in the network
to look for support.

ERank-0 produces gradually better estimates after each iteration. We
typically use $d_c(v_i) = d_0$ where $d_0$ is chosen to minimize an
objective function for a sample set of nodes in the network.

For Fig. 1(a) we see, for example, that using $d_0 = 0.95$, after
three iterations $ERank0(v_1) = 0.5127$, which is higher than the
exact value but lower than what it would be if $SP(v_2)$ and
$SP(v_3)$ were disjunct. We explore the effects of the damping values
later in this section.




Fig. 3. A simple network with a cycle. 



Dealing with cycles 

ERank-0 is prone to deterioration of ranks in the presence of 
cycles between nodes. This effect is stronger with immediate 
cycles but still present when indirect cycles are present. 

We formulate higher-order algorithms which avoid feedback for a given
maximum number of links between nodes. Based on the number of links
over which they avoid feedback, they are named ERank-1 (avoids
feedback between immediate neighbors, i.e. one link), ERank-2 (avoids
feedback between nodes separated by another node, i.e. two links), or
arbitrarily higher. ERank-0 has no such avoidance, hence the "0" in
the name. We also use ERank-N to refer to avoidance of feedback over
any possible number of links. These higher-order algorithms (ERank-1
and above) use a message-passing scheme to avoid feedback from cycles
by keeping a set of nodes which have already contributed to a
calculation. Further details regarding ERank-N can be found in [?]
and [15]. Also, in [15] we offer a formal treatment of the
theoretical framework presented here, introducing the Entity
Transitive Relation Implication (ETRI) model for the mapping of a
network into a PAS instance. In this previous work we present ERank
as a special case, tailored for the network ranking application, of a
general case algorithm named ETRI Support Propagation (ESP). However
we chose to use ERank throughout this article for the sake of
simplicity, also omitting other details that are not crucial.

Fig. 4. Average distance for various algorithms on Fig. 3. (Plotted
series: ERank-0 (it=3), ERank-0 (it=12), ERank-1 (it=3).)

For example, in Fig. 3 nodes 1 and 2 have an immediate cycle between
them. Fig. 4 shows how ERank-0 and ERank-1 perform when run on the
network of Fig. 3. It plots the average distance for a given
iteration:



$$\frac{1}{n} \sum_{i=1}^{n} \left| \widetilde{dsp}_i - dsp_i \right| \tag{6}$$



In this figure, we plot the results when ERank-0 is run for 3 
iterations, and when it is run for 12 iterations. For comparison 
we also plot the results from ERank-1 at 3 iterations. 

We observe ERank-0 algorithms with different iterations do 
comparably well, while ERank-1 outperforms others when d 
is chosen correctly. 

In our experimentation with the Reuters network we have 
not seen any significant improvements in estimation per- 
formances or ranking performances (as we introduce later) 
using these "higher" algorithms. This is probably because the 
Reuters network is undirected although we have not confirmed 
this. So we will not deal with the other ERank algorithms any 
further in this article due to space considerations. 



Assigning node and link assumption probabilities 

For applying ERank algorithms in particular, and PAS based 
ranking/analysis in general one needs to assign prior probabil- 
ities to assumptions. We will deal with the two different types 
of assumptions in the network mapped PAS knowledgebase; 
node and link assumptions. 



For the network of infection, the probability of the node 
assumption corresponds to the prior probability that an indi- 
vidual is infected. The probability of transmitting the infection 
is represented by the link assumption probabilities. 

If such prior probabilities for a relation in the network are known
they may be useful. Lack of such data does not make the analysis
impossible though. In this work we will use $p(a_i) = 1/n$ where $n$
is the number of nodes. In the evidence theory (DST) interpretation,
this corresponds to assuming that at least one node in the network
has the analyzed property. It can be thought of as minimal evidence,
or the most conservative assumption to make about the network before
analyzing it for a property.

If prior link probabilities are not known, we cannot offer a
similarly simple assignment for link probabilities. Instead a range
of values, such as conservative estimates depending on the relation,
can be used as we will show below. We use $p(l_{ij}) = p_{l0}$ for
all $l_{ij}$, where $p_{l0}$ is a model parameter, and various values
of it are investigated.

When applying ERank algorithms on the Reuters network 
we will use the transitive relation: "(if person A links to B) 
person B is influenced by person A". So, we will interpret 
our results to yield the posterior probability of a person being 
influential. 

ERank algorithms for approximating dsp values 

For successfully applying ERank algorithms, one needs to 
choose the number of iterations to run and what damping 
function or constant to use. 

Let us use $\iota$ to denote the number of iterations. For ERank-0,
for a given $\iota$ the corresponding maximum order approximated is
$\iota - 1$: each iteration after the first one generates
approximations for an additional order of support compared to the
previous iteration. Therefore the highest number of potentially
useful iterations is limited by the diameter of the network. Using
additional iterations does not necessarily create better
approximations, though, and the most suitable number of iterations
depends on the structure of the network. A way to decide on an
$\iota$ is to take into account what the maximum contribution of a
supporting argument of the corresponding order would be, and whether
there are enough such supporting arguments to make a difference. For
example, when the algorithm is run for 6 iterations then the maximum
order of the corresponding supporting arguments is 5. Assuming
$p_{l0} = 0.2$ gives $0.2^5 = 3.2 \cdot 10^{-4}$ as the maximum
contribution a supporting argument of order 5 would give, compared to
0.2 for immediate neighbors of a node. Note also that in the
small-world network model the average distance between nodes is known
to be unusually low compared to a random network [17]. This can serve
to limit the maximum number of iterations needed even for a very
large network.
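The bound used in this argument is easy to tabulate. The sketch below computes the largest contribution a single supporting argument can make for a given number of iterations, under a constant link probability, and derives from it one possible stopping heuristic. Both function names and the threshold-based rule are our own illustrative assumptions; the text above only states the bound itself.

```python
def max_argument_contribution(p_link, iterations):
    """Upper bound on the contribution of one supporting argument.

    An argument of the highest order reached after the given number of
    iterations contains iterations - 1 link assumptions, each with
    probability p_link.
    """
    order = iterations - 1
    return p_link ** order


def useful_iterations(p_link, threshold, diameter):
    """Hypothetical rule: add iterations while the bound stays above a
    chosen threshold, never exceeding the network diameter."""
    iterations = 1
    while (iterations < diameter
           and max_argument_contribution(p_link, iterations + 1) >= threshold):
        iterations += 1
    return iterations


bound_after_six = max_argument_contribution(0.2, 6)
suggested = useful_iterations(0.2, threshold=1e-3, diameter=13)
```

For a link probability of 0.2 the bound after 6 iterations is $3.2 \cdot 10^{-4}$, matching the figure above. With a threshold of $10^{-3}$ and the diameter of 13 reported for the Reuters network, this particular heuristic would stop at 5 iterations. It ignores how many arguments of each order exist, which the text notes also matters.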

In this work we use a constant damping function $d_0$, although it is
possible to come up with a different heuristic function. The choice
of the damping constant similarly depends on the structure of the
network. In this section we will use Eq. 6 as an objective function
and plot different approximation results using it.

As we have argued earlier, the exact dsp value of a node may be
prohibitively hard to compute. On the Reuters network we have been
able to compute the exact dsp values of nodes up to different maximum
orders ranging from 1 (just the immediate neighbors) to 11. We use as
many of these as possible as sample sets to plot the average distance
using Eq. 6. For example, when comparing against ERank-0 run with 6
iterations, we use all of the sample set for which we could calculate
the dsp values using the corresponding maximum order of 5. We do not
include nodes without any links in these calculations.

In Fig. 5 we consider the average distance on the Reuters
network where comparisons are made against dsp calculations
with a maximum order of 3. It contains the plots of ERank-0
for p_l0 = 0.2 and p(a_i) = 1/n using 3 and 4 iterations over the
damping constant range [0, 1], along with the corresponding dsp
computations using maximum orders of 1 and 2. The results
are offset in reference to dsp with maximum order 3, which is
represented by the line y = 0. We observe that when ERank-0
has a good damping constant it can outperform exact dsp
calculations of maximum order 2.

Similarly, in Fig. 6 we use the same probability values as
in Fig. 5 to compare how the different ERank algorithms perform
on the Reuters network. Using Eq. 6 we plot ERank results,
comparing them to dsp computations with a maximum order of 5.
ERank-0 appears here to perform as well as the higher order ERank
algorithms. As we have argued above, we believe this is
because the conversion from an undirected to a directed network
places cycles on all the nodes, although we have not validated
this yet.

Finally, observe that when computing ranks for ERank-0,
one calculation is made over every link per iteration. So
ERank-0 has a linear time complexity O(l) in the number
of links l per iteration.




Fig. 5. ERank-0 and dsp computations approximating dsp computations of
maximum order 3, which is represented by y = 0. Horizontal axis: d_0.

Fig. 6. Comparison of different ERank algorithms corresponding to a
maximum order of 5.

V. A Performance Evaluation of Ranking 
Algorithms 

In this section we will propose a method to compare the 
performances of different ranking algorithms on the Reuters 
network and then present a study of the performances of a 
number of well-known algorithms comparing them to ERank 
algorithms. 

A. Assessing importance of nodes 

We will link the importance of a person in 1987 to impor- 
tance today. We will see how well a person in the Reuters 
collection is represented in today's English Wikipedia and 
compare that with the rankings. Part of this study appeared 
before in [16]. 

To assess the validity of our results we used a
crawler to look up whether a given person has an English Wikipedia
page [24]. We interpret the existence of such a page as an
indication that the person is important today in a general, global
sense. This measure has an English-speaking world bias and is not
necessarily truly objective. However, since Reuters is also
an English-language source and English is the closest there
is to a truly global language, the measure should function at
least to a reasonable extent. Our basic assertion is that if
a person was important back in 1987, when the Reuters articles
were being published, then s/he would still be important today.
The 20 years that have passed since then provide a "time's judgment"
on who was truly important at the time. It is possible, however,
that other people in those articles who were unimportant at the
time, or whose importance was unforeseeable, have since gained
importance. Similarly, some who
were not very important from a Reuters reporting perspective
can actually be important individuals for different reasons.
Combined, these mean that the power of the assessment is
limited when it comes to discovering all those who
are important. The analysis should nevertheless be good
enough to penalize "false positives", i.e. persons whom the
algorithms mark as important but who really were not.

Using the crawler results we have constructed the
"has a page" function H(i), which is 1 if there is any Wikipedia page
for a given person i and 0 otherwise. Of the 5,249 persons in the
network we find that 1,440 have a Wikipedia page. In the rest
of this section we will use this function as a priori information
on the importance of nodes and perform a comparative study
of the algorithms. Table II shows the top 20 people when
ranked according to article count values. A glance at
this table can serve as a basic reality check for the utility of
the functions we have defined. For example, we see that most of
the people we would expect to have high importance have H(i) = 1:
the President of the USA, the Prime Minister of Japan, and the
Secretary of State of the USA.

TABLE II
TOP-20 PERSONS IN ARTICLE COUNT.

person          a. count   H(i)   notes
r.reagan        493        1      President
j.baker         212               Treasury Secretary
y.nakasone      112        1      Prime Minister, Japan
p.volcker       109               Ch. Fed. Resv. Board
k.miyazawa      86                Finance Minister, Japan
c.yeutter       85                Trade Representative
n.lawson        66                Chan. Exchequer, UK
d.funaro        58                Fin. Minister, Brazil
r.lyng          57                Agriculture Secretary
g.stoltenberg   55                Fin. Minister, W.Germ.
g.shultz        50         1      Secretary of State
m.thatcher      50                Prime Minister, UK
e.balladur      48                Fin. Minister, France
j.wright        47                W.H. Speaker, Texas
s.sumita        44                Bank of Japan Gov.
m.baldrige      42                Commerce Secretary
m.fitzwater     40                W.H. Speaker
a.greenspan     39                Ch. Fed. Resv. Board
j.ongpin        36                Fin. Secr., Philippines
j.sarney        36         1      President, Brazil



B. Performance as clustering validity

The function H(i) can be thought of as placing each node in
one of two classes, 0 and 1, i.e. those without and with
English Wikipedia pages. Hence this becomes a clustering
problem with an external criterion. We would ideally like an
algorithm to rank all the persons labeled with H(i) = 1 higher
than the ones labeled with 0, thus giving us a perfect separation
of the collection into two clusters. There is a well-known
statistic named "Hubert's gamma" (Γ) which is used for assessing
cluster validity in this class of problems [25]. Mathematically
stated, Hubert's Γ is:



\[ \Gamma = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} X(i,j)\, Y(i,j) \tag{7} \]

where

\[ Y(i,j) = \begin{cases} 0 & \text{if } H(i) = H(j) \\ 1 & \text{otherwise} \end{cases} \tag{8} \]

and X(i,j) is the distance between the two nodes. X(i,j)
is usually the Euclidean distance on the ranks. Let us use
ρ(i) to denote the rank value given to node i by the ranking
algorithm ρ. Then the Euclidean distance function is
X(i,j) = |ρ(j) - ρ(i)|. The Γ statistic measures the degree of
linear correspondence between the entries of X and Y.
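
As an illustration of Eqs. 7 and 8, the following minimal sketch computes the raw Γ for a two-class labeling. The function name `hubert_gamma` and the toy inputs are our own, hypothetical choices. It assumes that Y(i,j) is 1 exactly when the two labels differ, and it uses a plain O(n^2) loop over all pairs, which is meant for clarity rather than for a network with thousands of nodes.

```python
from itertools import combinations


def hubert_gamma(rank_values, labels):
    """Raw Hubert's Gamma: sum of X(i,j) * Y(i,j) over all pairs i < j,
    with X(i,j) = |rank_j - rank_i| and Y(i,j) = 1 iff the labels differ."""
    if len(rank_values) != len(labels):
        raise ValueError("rank_values and labels must have the same length")
    gamma = 0.0
    for i, j in combinations(range(len(labels)), 2):
        x = abs(rank_values[j] - rank_values[i])  # X(i, j)
        y = 0 if labels[i] == labels[j] else 1    # Y(i, j)
        gamma += x * y
    return gamma


if __name__ == "__main__":
    ranks = [4, 3, 2, 1]
    print(hubert_gamma(ranks, [1, 1, 0, 0]))  # labels separated by rank
    print(hubert_gamma(ranks, [1, 0, 1, 0]))  # labels interleaved
```

In the toy input the separated labeling yields a larger Γ than the interleaved one, which is the behaviour the test relies on.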



The power of a statistical test is in establishing how unusual
a given ordering is. To do this we come up with a null
hypothesis H_0, which is a statement of "no structure". The
H_0 for Γ is called the "random label hypothesis" (RLH), which
postulates that all permutations of the labels on n objects are
equally likely. We establish a distribution for H_0 using Monte
Carlo sampling, creating random permutations of node labels
on our collection (we shuffle the node labels and calculate
the corresponding Γ values). For Γ, the higher the value, the
more likely that a given labeling is unusual. We compare the
Γ values obtained from our algorithms with the RLH distribution,
and if we find these Γ values to be unusually large then we can
conclude the algorithm is successful.
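
The Monte Carlo construction of the RLH distribution can be sketched as follows. This is an illustrative outline rather than the code used for the experiments: the names `gamma` and `rlh_distribution`, the seed, and the toy data are assumptions, and Γ is computed directly as the sum of distances over pairs whose labels differ.

```python
import random


def gamma(positions, labels):
    """Raw Hubert's Gamma for 0/1 labels: sum of |x_i - x_j| over all pairs
    in which one node is labeled 1 and the other is not."""
    ones = [x for x, h in zip(positions, labels) if h == 1]
    zeros = [x for x, h in zip(positions, labels) if h != 1]
    return float(sum(abs(a - b) for a in ones for b in zeros))


def rlh_distribution(positions, labels, samples=1000, seed=0):
    """Sample Gamma under the random label hypothesis by shuffling labels."""
    rng = random.Random(seed)
    shuffled = list(labels)  # copy, so the caller's labels stay untouched
    null_gammas = []
    for _ in range(samples):
        rng.shuffle(shuffled)  # one random permutation of the labels
        null_gammas.append(gamma(positions, shuffled))
    return null_gammas


if __name__ == "__main__":
    positions = list(range(1, 11))
    labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
    observed = gamma(positions, labels)
    null_gammas = rlh_distribution(positions, labels, samples=500)
    print(observed, max(null_gammas), sum(null_gammas) / len(null_gammas))
```

In the toy data the three labeled nodes occupy the top positions, so no shuffled labeling can exceed the observed Γ.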

Since we also wish to compare the performances of the
different algorithms, we have used the positions assigned by
the algorithms to a node instead of the rank values. This
way we make the Γ values obtained directly comparable. For
example X(i,j) would be defined as

\[ X(i,j) = |\mathrm{Pos}_\rho(j) - \mathrm{Pos}_\rho(i)| \]

where $\mathrm{Pos}_\rho(i)$ is the position given to node i
according to ρ. This however brings another problem when a
ranking algorithm assigns the same rank value to a large set of
nodes: two nodes with the same rank value can have positions which
are far apart, and so be ranked very differently in terms of
positions despite being equivalent in actual ranks. To overcome
this problem we randomly sampled different orderings,
in which nodes with equal rank values are shuffled into random
positions among each other for each calculation of Γ. This
then gives our distance function X(i,j) for ρ as:

\[ X(i,j) = |\overline{\mathrm{Pos}}_\rho(j) - \overline{\mathrm{Pos}}_\rho(i)| \tag{9} \]

where $\overline{\mathrm{Pos}}_\rho(i)$ is the average value of
$\mathrm{Pos}_\rho(i)$ obtained after the random sampling.

Hubert's Γ combined with H(i) thus gives us a statistical
test to compare the performances of any ranking algorithms on
the Reuters network.
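
The tie handling behind Eq. 9 can be illustrated with a short sketch. The function name, the sample count and the seed below are hypothetical. The sketch assumes that position 1 goes to the highest rank value and that ties are broken uniformly at random in every sampled ordering.

```python
import random


def average_positions(rank_values, samples=100, seed=0):
    """Average position of each node over random orderings in which nodes
    with equal rank values are shuffled among each other."""
    rng = random.Random(seed)
    n = len(rank_values)
    totals = [0.0] * n
    for _ in range(samples):
        # Sort by decreasing rank value; a fresh random number per node
        # breaks ties differently in every sampled ordering.
        order = sorted(range(n), key=lambda i: (-rank_values[i], rng.random()))
        for position, node in enumerate(order, start=1):
            totals[node] += position
    return [total / samples for total in totals]


if __name__ == "__main__":
    # Nodes 1 and 2 share a rank value, so they share positions 2 and 3.
    print(average_positions([0.9, 0.5, 0.5, 0.1], samples=200))
```

Nodes with a unique rank value keep a fixed position, while tied nodes drift toward the middle of the block of positions they share as the number of samples grows.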

C. Performance results

We have run the ERank algorithms ERank-0, ERank-1 and
ERank-2 on the Reuters network. We compare their results with
those of the following algorithms:

• Article count is the number of articles a person appears
in.

• Degree is the number of people a person is associated
with in the collection, i.e. the link count of the node (in
the undirected network).

• Closeness, calculated using the undirected unweighted
network.

• Betweenness, calculated using the undirected unweighted
network.

• PageRank is the PageRank of a node using d = 0.5.
To apply it we have converted the undirected network
to a directed one by replacing each edge with arcs in both
directions.

The Γ values for all the algorithms are given in Table III; these
and the later results in the figures are obtained by averaging the
calculations of 100 samples. Fig. 7 shows how the Γ values for
the RLH and the algorithms relate. For this experiment we have used
10000 samples for calculating the RLH distribution, assigning
them to 40 bins in a histogram.

TABLE III
Γ VALUES FOR DIFFERENT ALGORITHMS.

algorithm      Γ               parameters
a. count       9.974 · 10^9
degree         9.921 · 10^9
betweenness    9.894 · 10^9
closeness      1.002 · 10^10
PageRank       9.760 · 10^9    d = 0.5
(1) ERank-0    1.003 · 10^10   iterations = 6, p_l0 = 0.2, d_0 = 0.7
(2) ERank-1    1.003 · 10^10   iterations = 3, p_l0 = 0.2, d_0 = 0.8
(3) ERank-2    1.003 · 10^10   iterations = 2, p_l0 = 0.2, d_0 = 0.9
(4) ERank-0    1.004 · 10^10   iterations = 12, p_l0 = 0.1, d_0 = 0.3
mean RLH       9.599 · 10^9

Fig. 7. Γ values for the algorithms and the RLH. The 1.001 - 1.005 × 10^10
region is expanded in the inset.

We find that all the algorithms in fact give a valid clustering,
as the Γ values produced by the algorithms are higher than the whole
sampling collection for the RLH. For Monte Carlo sampling,
when m is the sample size and the observed Γ is among the k largest
of the m values in the sample set, the probability of
incorrectly rejecting H_0 when it is true is α = k/m. k is
usually chosen higher than 5 [25], so for this experiment using
m = 10000 and k = 10 we get the level of significance as
α = 0.001, which corresponds to a high confidence level.
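
The decision rule above amounts to a simple count. The sketch below is only one possible reading of it: the function name `rlh_test` and the toy null sample are hypothetical, and "among the k largest" is interpreted as fewer than k sampled values being at least as large as the observed Γ.

```python
def rlh_test(observed_gamma, null_gammas, k=10):
    """Return (reject, alpha): reject the random label hypothesis when fewer
    than k of the m sampled Gamma values reach the observed one."""
    m = len(null_gammas)
    if not 0 < k <= m:
        raise ValueError("k must be between 1 and the sample size m")
    at_least_as_large = sum(1 for g in null_gammas if g >= observed_gamma)
    reject = at_least_as_large < k
    alpha = k / m  # level of significance of the test
    return reject, alpha


if __name__ == "__main__":
    toy_null = list(range(10000))         # stand-in for 10000 RLH samples
    print(rlh_test(10000.5, toy_null))    # above every sampled value
    print(rlh_test(5000, toy_null))       # in the middle of the sample
```

With m = 10000 and k = 10 the returned level of significance is 0.001, matching the value used in this experiment.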

It is not a surprise that all the algorithms yield a valid
clustering, given that they are widely used in different applications.
However, we can distinguish between the comparative performances
of the algorithms statistically, in terms of how unusually
good their results are. We observe that, when suitably
parameterized, ERank outperforms all the other algorithms.

VI. Choosing ERank parameters 

A successful application of ERank depends on choosing
various parameters. Firstly, for constructing the PAS instance,
one has to choose the probabilities of the assumptions, p(a_i)
and p(l_j), based on the transitive relation used. Then a
damping function (e.g. the constant damping function d_0) and
the number of iterations have to be chosen. All of these
have complex interactions, and it is not always clear how
they relate to each other and to the algorithm performance in
general. In this article we have employed a constant node
assumption probability function p(a_i) = 1/n and a link
assumption probability function p(l_j) = p_l0, along with the
constant damping function d_0. In this section we briefly
explore how these different parameters interact and affect
the algorithm performance, as indicated by Γ, on the Reuters
network.

In Fig. 8 we see how different p_l0 values affect Γ values
for different d_0 values using ERank-0. As can be seen, some
p_l0 values result in a wider range of d_0 values where high
Γ values are obtained. The optimal d_0 values are much lower for
the Γ calculation than what was found in the
approximation section (e.g. for p_l0 = 0.2). This may be a shift
due to the change in the objective function and the use of
positions rather than actual values. Also, the nodes in the dense
areas of the network may shift the average clustering to a higher
degree. Another observation is that the results are robust for
a range of d_0 and p_l0 choices. Fig. 9 shows how the different
ERank algorithms compare. In line with the approximation
results, ERank-0 is the best performer by a small margin.
Finally, Fig. 10 plots how Γ values change with an increasing
number of iterations for different d_0 values. Usually the Γ values
start dropping around iterations 4-8; however, an interesting
observation here is that d_0 = 0.15 appears unusually stable. This
may be because d_0 also compensates for immediate cycle effects.
Fig. 8. Γ values using ERank-0 for different p_l0 and d_0 values using 6
iterations.



Fig. 9. Γ values using different ERank-N algorithms for p_l0 = 0.1 and
0.1 < d_0 < 1.0, corresponding to a maximum order of 5.

Fig. 10. Γ values using ERank-0 for p_l0 = 0.1 with increasing number of
iterations for different d_0 values.

VII. Conclusion

We have introduced a family of novel rapid approximation
algorithms for applying a PAS based modeling and ranking
to large complex networks (particularly small-world model
networks). As far as we are aware, it is the first of its kind that
is both practically applicable to large networks and formally
founded in a quantitative reasoning framework. A problem
known to be NP-complete is approximated using linear and
near linear time algorithms for this specialized application
domain. Thus ERank enables the use of a new paradigm in
addition to the Markov (random surfer) model for ranking
probabilistically.

We have explored various issues for a sound application
of the algorithm on the Reuters network [16]. These include
the choice of a damping function, the assignment of the prior node
and link assumption probabilities, and the choice of the number of
iterations.

We propose a statistical test to compare the performances
of ranking algorithms on the Reuters network using a
clustering validity test. We apply a number of well-known
algorithms and compare their results with those of the ERank
algorithms. When the ERank algorithms are suitably parameterized,
they perform better than the other algorithms. An unexpected
finding was that PageRank was the worst performing of the
algorithms considered (more on this in [16]). This may be
related to the conversion from the undirected network to
a directed one.

Our experimentation reports good performance for a wide
range of parameters. This is good in the sense that ERank
appears to be robust. It is also possible to interpret this as
the test not being able to distinguish performance results above
a certain precision or threshold, although it was good enough
to uncover performance differences between the various
algorithms.

The superior performance of ERank may be attributed to a
global character present in the final ranks. For example, in
a given network a node in a "dense" area will surely be
ranked highly despite possibly very intricate details of linking
between the nodes. Once the obvious sources of distortion are
removed (e.g. immediate cycles) and an expected clustering
is accounted for (i.e. by the damping function), the "big picture"
can be obtained correctly despite many possible distortions.

ERank, as we apply it, is susceptible to various sorts of
manipulation as a ranking algorithm. For example, it would
not be able to discover an unusual overestimation caused by a
high rank source behind a facade of immediate neighbors. This
follows from our choice of a constant damping function.
One may need to come up with a better heuristic function,
or a combination of exact and approximate algorithms can be
used. On the other hand, it is a global ranking algorithm like
PageRank and would have resistance to manipulation in this
sense. Therefore testing its robustness against manipulation is
a possible future research direction.

A problem with this experimentation is the conversion
from an undirected to a directed graph. While interesting as an
experiment on an (essentially) undirected graph, the
Reuters network did not allow us to test our algorithms on
a truly directed network. It remains as future work to apply
ERank on a truly directed graph and evaluate its performance
against a priori information. On such a graph we would expect
ERank-N with N > 0 to outperform ERank-0.

Also as future work, it would enhance the reliability of the
prior information to include information from Wikipedias of
different languages, as well as to use other reference sources.

What we present here attempts to nominate ERank as a 
good algorithm for at least some ranking applications. Possibly 
much more needs to be done to establish how different ranking 
algorithms including ERank compare with each other for 
different applications. In this regard, given ERank's theoretical 
soundness and the superior performance in this experimenta- 
tion, we hope to stimulate further research and interest in this 
direction. 